Equilibrium study of Protein/DNA Complexes from Crystal Structure Data
From crystal structure data, and using the concepts of equilibrium statistical mechanics, we
show how to calculate the thermodynamics of protein/DNA complexes. We apply the method to
the TATA-box binding protein (TBP)/TATA sequence complex. We estimate the change in
free energy and entropy for each base pair (bp). The local rigidity of the DNA is estimated
from the curvature of the free energy. We also estimate the free energy gain of the protein due to
bond formation with a particular bp. We thus identify the bps responsible for specific binding.

PACS numbers: 82.39.Pj, 87.15.Rn, 87.15.-v

All cellular processes in life are controlled by two
biopolymers, namely proteins and nucleic acids, along with
many smaller molecules such as lipids, carbohydrates and
water, via their specific and non-specific interactions.
Amino acids form the primary structure of a protein
molecule. On the other hand, the structural unit of an
antiparallel double-helical deoxyribonucleic acid (DNA)
molecule, as shown in Fig. 1(a), is the nucleotide. It
consists of a 5-carbon neutral sugar (deoxyribose) and a
nitrogen-containing hydrophobic base, either a purine
(adenine, A and guanine, G) or a pyrimidine (thymine, T and
cytosine, C), attached to the sugar, with the sugar in turn
attached to a negatively charged phosphate group. The
Watson-Crick base pairs (bp), namely A with T and G
with C, remain at the core of the double helix with strong
inter-base-pair stacking, while the phosphates line the
periphery. Functional groups of the bases (amino, imino
or keto) capable of forming hydrogen bonds (H-bonds) with
functionally important protein molecules are exposed to
the solvent within the two grooves, major and minor,
as shown in Fig. 1(b).





FIG. 1: (color online) (a) A nucleotide with Watson-Crick
base pairing, where adenine is paired with thymine. The X and Y
axes are in the plane of the base pair and the Z axis is
perpendicular to the base pair plane. The Y axis is the vector
joining the C1' atoms. (b) The structure of a normal DNA.



Protein synthesis in an organism is controlled via
gene expression. It consists of two parts: (i) transcription,
the synthesis of ribonucleic acid (RNA) from DNA,
and (ii) translation, the synthesis of protein from RNA.
For initiation of transcription, a transcription factor
comprising a class of proteins including RNA polymerase
needs to bind to the specific site of DNA (promoter) that
encodes for the given protein (gene). Consequently, one
major thrust in biochemical research aims at understanding
protein/DNA interaction. Some of the issues of immense
biochemical importance are: (i) energetics involved in the
complex formation; (ii) time scales related to different
degrees of freedom of a DNA bp in the complex formation;
(iii) kinetics of binding. Apart from pedagogical interest,
these issues have extremely important applications in drug
design, macromolecule recognition, etc.

In this paper we show how x-ray crystal structure
data can be utilized to understand the thermodynamics
of a protein/DNA complex at the level of a bp, using the
fundamental concepts of equilibrium statistical mechanics.
We have built up, from the crystallographic data
for both the complexed and the free state, the free energy of
deformation in six degrees of freedom of a bp in a given
DNA sequence, treating the bp as a rigid plane. We can
thus estimate the change in free energy $\delta F^{\mathrm{DNA}}_{\alpha}$ of the
$\alpha$-th bp upon complex formation. We further identify
from the crystallographic data the H-bonded atoms of a
given protein residue with the given bp and estimate the
energy of binding, which is essentially the free energy gain
upon complexation if the complex formation is
energy dominated, especially at low temperatures. One
can thereby estimate the free energy gain of the protein,
$\delta F^{\mathrm{prot}}_{\alpha}$, upon complexation at the $\alpha$-th bp by accounting
for the accompanying ion and water release. $\delta F^{\mathrm{DNA}}_{\alpha}$ sheds
some light on the time scale of DNA bp dynamics within
the elastic approximation. We apply this analysis to the
particular complex of TBP/TATA sequence DNA. The
TBP/TATA box complex is one of the most important
and well studied protein/DNA complexes.
Crystal structures of TBP/TATA box show that TBP
binds through the minor groove to a severely deformed
consensus TATA sequence, namely TATA(T/A)A(T/A)N,
where the bases in one of the strands have been indicated,
(T/A) being either thymine or adenine and N any of the
four bases. The functional groups in the minor groove
are incapable of providing enough discriminatory H-bond
partners to TBP for specific binding. An indirect mode
of recognition has been proposed to explain TBP/TATA
sequence-specific binding. Here the DNA becomes
structurally rigid in a severely deformed conformation that
allows the TBP to form adequate H-bonds. However, a
previous bioinformatics study found that DNA becomes
more flexible upon protein binding. Our analysis of the
TBP/TATA box complex sheds some light on this specific
binding mode, which is of profound biological interest.

We have taken structures of the available TBP/TATA-
sequence complexes solved by x-ray crystallography at
a resolution better than 2.5 Å from the Protein Data
Bank (PDB) [11]. The temperature of the selected
complexes is in the range 100-120 K and the pH lies
within 6.5-7.0. We have taken all such TBP/TATA
sequence complexes as an independent data set. For all the
selected complexes TBP binds to a ten-bp highly
conserved (> 90%) DNA sequence, CTATAAAAGG in one
strand (5' to 3'), with normal Watson-Crick
base pairing in the opposite strand. We take two
consecutive bps in the 5' to 3' direction and construct a mean
axis system with respect to which the bp geometrical
parameters are defined [12]. We define the base normal, $N_\alpha$, as
the vector normal to the mean plane defined by all ring
atoms of the $\alpha$-th base, obtained from the coordinates of
the atoms listed in the PDB files. We further take
the bp normal, $Z_\alpha$, as the mean of $N_\alpha$ of the two paired
bases, normalized to a unit vector. The bp long axis, $Y_\alpha$,
is the vector along the line joining the C1' atoms of the bp,
as shown in Fig. 1(b). The bp short axis, $X_\alpha$, is the
vector normal to both $Y_\alpha$ and $Z_\alpha$, pointing towards the
major groove side of the bp. The base-base vector, $M$, is
the vector joining the centers of two consecutive bps, and
the mean doublet z-axis is

$$Z_m = \frac{(X_1 + X_2) \times (Y_1 + Y_2)}{|X_1 + X_2|\,|Y_1 + Y_2|}.$$

The bp parameters have been calculated using the relations:
tilt, $\tau = -\sin^{-1}(Z_m \cdot X_1)$; roll, $\rho = \sin^{-1}(Z_m \cdot Y_1)$;
twist, $\omega = \cos^{-1}[(X_1 \times Z_m) \cdot (X_2 \times Z_m)]$;
shift, $D_x = M \cdot \frac{X_1 + X_2}{|X_1 + X_2|}$;
slide, $D_y = M \cdot \frac{Y_1 + Y_2}{|Y_1 + Y_2|}$; and rise, $D_z = M \cdot Z_m$.
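
The relations above reduce to a few vector operations. The following is a minimal Python sketch, not the authors' code. It assumes that the unit axes X and Y of the two bps and the vector M have already been extracted from the PDB coordinates. The function name `step_parameters` and the helper names are hypothetical. Normalizing the two cross products in the twist and clamping the arguments of the inverse trigonometric functions are added assumptions for numerical safety.

```python
import math


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def unit(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]


def clamp(v):
    # Keep rounding errors from pushing asin/acos arguments outside [-1, 1].
    return max(-1.0, min(1.0, v))


def step_parameters(x1, y1, x2, y2, m):
    """Tilt, roll, twist (degrees) and shift, slide, rise (units of m)
    for one bp step, from the unit axes of bp 1 and bp 2."""
    xm = unit(add(x1, x2))          # (X1 + X2) / |X1 + X2|
    ym = unit(add(y1, y2))          # (Y1 + Y2) / |Y1 + Y2|
    zm = cross(xm, ym)              # mean doublet z-axis Z_m
    tilt = -math.degrees(math.asin(clamp(dot(zm, x1))))
    roll = math.degrees(math.asin(clamp(dot(zm, y1))))
    # Assumption: the cross products are normalized before taking the angle.
    c = dot(unit(cross(x1, zm)), unit(cross(x2, zm)))
    twist = math.degrees(math.acos(clamp(c)))
    return {"tilt": tilt, "roll": roll, "twist": twist,
            "shift": dot(m, xm), "slide": dot(m, ym), "rise": dot(m, zm)}
```

For an idealized step in which the second bp is rotated by 36° about the common normal and displaced by 3.4 Å along it, this sketch returns a twist of 36°, a rise of 3.4 Å and zero for the remaining four parameters.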

We generate a histogram for each of the local parameters
of the TATA sequence DNA complexed with TBP.
Fig. 2(a) shows typical histograms $P(\rho)$ for $\rho$. There are
three distinct sets of histograms: set I shows the data for
the first bp, set II shows data for the fifth bp and set III
shows that for the second bp. In set I, the mean value
of $\rho$ lies within the range 0° to 4°, which is similar to that
known in the free case [13]. So the bps in set I have almost no
roll deformation due to complexation. On the other hand,
set III, where the mean value of $\rho$ is within 48° to 52°,
exhibits the maximum deformation due to complexation.
The mean value of $\rho$ in set II lies within 20° to 24°. Fig.
2(b) shows the histograms $P(\omega)$ for $\omega$. Here the mean
value of $\omega$ for the first bp is about 36°, comparable to the mean $\omega$
in the free case. The mean values of $\omega$ corresponding to
the second and the sixth bp are 15° and 18° respectively,
exhibiting large deformation in $\omega$. The histograms for
$\tau$ corresponding to all bps have mean values similar to
the known tilt value in the free case (0.0° ± 0.5°), showing
an insignificant effect of complexation.

FIG. 2: (color online) (a) $P(\rho)$ vs. $\rho$: Set I for the first base
pair (fourth and ninth bp being similar), Set II for the sixth bp
step (fifth and seventh being similar) and Set III for the second
bp step (third and eighth bp showing a similar trend). (b) $P(\omega)$ vs. $\omega$: Set
I for the second and sixth base pair steps (fifth and seventh bp being
similar), Set II for the first base pair step (third, fourth, eighth
and ninth bp being similar).

We further find the atoms of the protein residues forming
H-bonds to the bases in a given bp, along with the
sugar ring and the phosphate groups attached to it.
Here the H-bond analysis has been done with the help of
the pyrHBfind software [14]. The binding region is primarily
located between the fourth and the seventh bp. There
is direct H-bonding at the fifth and the sixth bp between
the bases and the amino acid residues asparagine and
tryptophan. Fig. 3(a) shows the probability of finding H-
bonds through sugar, phosphate and base in the vicinity
of the $\alpha$-th bp. The maximum number of H-bonds, three
with the bases, two with the phosphate oxygen and one
with the sugar, is observed in the vicinity of the fifth bp.
There is a substantial number of H-bonds through the
sixth bp as well. Binding through sugar dominates
between the fifth and the seventh bp, the maximum being at the
sixth bp. The correlation between the deformation of
DNA and the binding pattern is better revealed from the
inset of Fig. 3(a), where the H-bonding energy $E_B^{\alpha}$
between the bound part of the bp and the protein residue
is plotted against bp. We find that the classical
electrostatic part has the dominant contribution to $E_B^{\alpha}$ in all the
cases. $E_B^{\alpha}$ has a funnel structure between the third and
the seventh bps, having a minimum of about -30 kcal/mole at
the fifth bp. At the seventh bp, where protein residues do
not directly interact with the base but interact with the
backbone, $E_B^{\alpha}$ is relatively small, about -13 kcal/mole.
Similarly, at the second and the ninth bp, where the
phosphate oxygen interacts with polar amino acid residues, $E_B^{\alpha}$
is about -12 kcal/mole. Thus the base binding contributes
an energy of about -15 kcal/mole at the fifth and the sixth bps.
This additional energy can be defined as that due to the
specificity of the base binding, the specificity being larger
at the fifth bp. Note that the binding process makes the
bps energetically inhomogeneous, which is particularly
remarkable for the fifth, sixth and seventh bp, each of
them being A. The $E_B^{\alpha}$ data corroborate the
deformation data in Figs. 2(a) and (b), namely, the $P(\rho)$ and
the $P(\omega)$ data group together for the bps in the region
of stronger binding. $P(\rho)$ and $P(\omega)$ for the second and
the eighth bp are distinct from the others, despite having
relatively weaker $E_B^{\alpha}$. $E_B^{\alpha}$ has weak local minima at these
bps. This indicates that the protein binding may initiate
at these two metastable points, which have strong mechanical
deformation.

We calculate the free energy of deformation per complex
at the $\alpha$-th bp, $\beta F^i_\alpha = -\ln[P^i_\alpha]$, where $i$ denotes
any of the six bp parameters, $\beta = 1/k_B T$, $k_B$ being the
Boltzmann constant and $T$ the temperature. Fig. 3(b)
shows $\beta F_\alpha(\rho)$ for the roll of the fifth, sixth and seventh
bp. $\beta F_\alpha(\rho)$ has a minimum with a fair degree of
harmonicity, typical for an elastic degree of freedom, with
deviations only for $\rho$ values far away from the minimum.
We find a similar trend in all other free energy profiles. The
curvature of the free energy at the minimum is a measure
of the local rigidity corresponding to the bp parameter.
The inset of Fig. 3(b) shows the curvature at different
bps for the rotational parameters, $C^i_\alpha$ ($i = \rho, \tau, \omega$). The
$C^i_\alpha$ data show large local rigidity for the bps with large
$E_B^{\alpha}$. One can estimate the frequency of small oscillations
about the equilibrium from the curvature data, given
the moment of inertia of a bp with respect to the
relevant axis of rotation. The time period corresponding to
the rotational parameters ranges between 30 and 50 ps, which
is comparable to the solvation time scale of water molecules
in the vicinity of a protein/DNA complex [17].
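
The procedure described in this paragraph can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation. The normalized histogram is taken as given. The curvature is obtained from an ordinary least-squares parabola, since the fitting form used for the figure is not specified in the text. The moment of inertia passed to `oscillation_period` is a user-supplied input. All function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def free_energy_profile(centres, probs):
    """Return (x, beta*F) pairs with beta*F = -ln P for the non-empty bins."""
    return [(x, -math.log(p)) for x, p in zip(centres, probs) if p > 0.0]


def quadratic_fit(xs, ys):
    """Least-squares fit ys ~ a + b*u + c*u**2 with u = x - mean(xs).
    Returns (position of the minimum, curvature 2c)."""
    x_bar = sum(xs) / len(xs)
    us = [x - x_bar for x in xs]
    s = [sum(u ** k for u in us) for k in range(5)]               # power sums
    t = [sum(y * u ** k for u, y in zip(us, ys)) for k in range(3)]
    mat = [[s[0], s[1], s[2], t[0]],
           [s[1], s[2], s[3], t[1]],
           [s[2], s[3], s[4], t[2]]]
    # Gauss-Jordan elimination with partial pivoting on the normal equations.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(mat[r][col]))
        mat[col], mat[pivot] = mat[pivot], mat[col]
        for r in range(3):
            if r != col:
                f = mat[r][col] / mat[col][col]
                mat[r] = [v - f * w for v, w in zip(mat[r], mat[col])]
    _, b, c = (mat[i][3] / mat[i][i] for i in range(3))
    return x_bar - b / (2.0 * c), 2.0 * c


def oscillation_period(curv_per_deg2, temperature_k, inertia_kg_m2):
    """Period (s) of small oscillations for beta*F = (C/2) * angle**2."""
    curv_per_rad2 = curv_per_deg2 * (180.0 / math.pi) ** 2
    stiffness = K_B * temperature_k * curv_per_rad2   # J per rad^2
    return 2.0 * math.pi * math.sqrt(inertia_kg_m2 / stiffness)
```

Here the curvature of $\beta F$ is assumed to be given in units of deg$^{-2}$. Converting it to rad$^{-2}$ and multiplying by $k_B T$ gives a torsional stiffness $\kappa$, and the period follows as $2\pi\sqrt{I/\kappa}$ for a moment of inertia $I$.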

FIG. 3: (color online) (a) H-bonding probability $P_B^{\alpha}$ of sugar
(□), phosphate (△) and base (○) with protein residues plotted
against $\alpha$. The joining lines are guides to the eye. Inset:
binding energy $E_B^{\alpha}$ vs. $\alpha$. (b) Free energy of deformation
corresponding to $\rho$. Leftmost data (*) are for the free case. The
other three are for the bound cases: seventh (△), sixth (□) and
fifth (○) bp respectively. Lines are the best-fit curves. Inset:
$C^i_\alpha$ corresponding to the three orientational bp parameters
plotted against $\alpha$: $\rho$ (□), $\omega$ (△) and $\tau$ (○). The joining lines
are guides to the eye.

We also estimate $\delta F^{\mathrm{DNA}}_{\alpha}$ and $\delta F^{\mathrm{prot}}_{\alpha}$. To this end we
select the fifth and the sixth bp, where the base binding
gives specific stability to the complex. We compare the
results to those of the seventh bp, which has no H-bonding
with the base, and of the free AA bp doublet. Fig. 3(b)
shows the free energy for the free case as well. The
third column of Table I shows that the change in curvature
in $\rho$, compared to the free case, is largest at the
fifth bp. The difference in free energy minima between
the complex and the free case gives $\delta F^{\mathrm{DNA},i}_{\alpha}$ for the $i$-th bp
parameter, so that $\delta F^{\mathrm{DNA}}_{\alpha} = \sum_i \delta F^{\mathrm{DNA},i}_{\alpha}$. The second
column of Table I shows $\delta F^{\mathrm{DNA}}_{\alpha}$ for $\alpha$ = 5, 6 and 7. We
find that $\delta F^{\mathrm{DNA}}_{\alpha} < 0$ for all $\alpha$, indicating that after
complexation the DNA goes to a thermodynamically more
favoured state. The fourth column of Table I shows the
loss of entropy of DNA estimated within the harmonic
approximation. The $\alpha$ dependence of the entropy loss
indicates that different amounts of heat are generated as the
H-bonds are formed at different bps in the complex. The free
energy gain upon complexation at the $\alpha$-th bp is given
by $\delta F^{\mathrm{comp}}_{\alpha} = \delta F^{\mathrm{DNA}}_{\alpha} + \delta F^{\mathrm{prot}}_{\alpha} + \delta F^{\mathrm{water}}_{\alpha} + \delta F^{\mathrm{ion}}_{\alpha}$, where
$\delta F^{\mathrm{water}}_{\alpha}$ and $\delta F^{\mathrm{ion}}_{\alpha}$ are the free energy costs for the water and
ions released, respectively, upon bond formation at the $\alpha$-th
bp. For low temperature we take $\delta F^{\mathrm{comp}}_{\alpha} \approx E_B^{\alpha}$, ignoring
the entropy effects due to the bond vibrations. However,
the water and ion releases are entropy-driven
processes. When TBP binds to the TATA sequence, 19 water
molecules are released from the interfacial surface [19].
We find that seven phosphate groups in the TATA
sequence are neutralized by seven positively charged amino
acid residues. So we expect that upon complex formation,
on average, 14 ions are released from the interfacial
surface. The entropy gain ($\approx \delta F^{\mathrm{water}}_{\alpha} + \delta F^{\mathrm{ion}}_{\alpha}$) due
to the displaced ions and water molecules is estimated to
be $-38.29\,k_B T$ [23]. Thus $\delta F^{\mathrm{prot}}_{\alpha}$ can be estimated, and it is
shown in the last column of Table I, where we observe
that the free energy gain by the protein is maximum at
the fifth bp. We thus find that the change in curvature
and the maximum gain in thermodynamic free energy of
the different components in the TBP/TATA sequence
complex are strongly correlated. The metastable complex
formed through binding at the second and the eighth bp is
stabilized by enhanced base binding at the locally more rigid
fifth bp. This mechanism holds the key to the indirect
recognition in the TBP/TATA sequence complex. Earlier
attempts might have missed the enhanced local rigidity by
taking an average over all protein/DNA complexes.
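
The bookkeeping in this paragraph can be written out in a few lines. The following Python sketch is illustrative only and is not intended to reproduce the entries of Table I. It assumes that all quantities are expressed in the same energy units. The entropy function uses the standard result for a Gaussian distribution of one harmonic coordinate, which is an assumption about how the harmonic approximation is applied here. Both function names are hypothetical.

```python
import math


def harmonic_entropy_change(curv_bound, curv_free):
    """Entropy change, in units of k_B, of one harmonic coordinate whose
    free energy curvature changes from curv_free to curv_bound.
    For P(x) ~ exp(-C x^2 / 2) the entropy is (1/2) ln(2 pi e / C)."""
    return -0.5 * math.log(curv_bound / curv_free)


def protein_free_energy(e_bind, df_dna, df_water_ion):
    """Solve dF_comp = dF_DNA + dF_prot + dF_water + dF_ion for dF_prot,
    with the low-temperature assumption dF_comp ~ E_B.
    df_water_ion is the combined water and ion term."""
    return e_bind - df_dna - df_water_ion
```

A stiffer coordinate in the complex (larger curvature than in the free case) gives a negative entropy change, that is, an entropy loss. For example, doubling the curvature changes the entropy by $-\frac{1}{2}\ln 2 \approx -0.35\,k_B$ per coordinate.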



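TABLE I: Thermodynamic data for the selected bp.

$\alpha$    $\delta F^{\mathrm{DNA}}_{\alpha}$    $(C_{\alpha} - C^{\mathrm{free}})$    Entropy change of DNA    $\delta F^{\mathrm{prot}}_{\alpha}$
Fifth       -0.23       0.0025      -5.7        -48.9
Sixth       -0.25       0.0021      -5.47       -41.26
Seventh     -0.23       0.0003      -4.77       -17.12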



In summary, we show here an approach, based on
equilibrium statistical mechanics, by which crystal
structure data can be used to obtain the bp-wise stability of a
protein/DNA complex. Our analysis shows that the
maximum specificity of the TBP/TATA sequence complex
comes from H-bonding with the fifth bp, where the
changes in the rigidity are the maximum, though the
binding may proceed from the second and the eighth
bp. Even though our analysis has been in the crystal
phase, recent studies indicate that the enzymatic
properties of proteins and nucleic acids remain intact in the
crystal phase. Hence, our predictions should hold
even for the TBP/TATA complex in the solution phase,
pertinent to in-vivo situations. Our predictions may be
verified by single-molecule experiments where the H-bond
forming ability of different bps of the TATA-box sequence
can be selectively altered by thio-substitution. We would
point out that our analysis is quite general and can be
applied to any protein/nucleic acid complex. We shall
report such detailed analysis in future publications.